
Improving performance of batchgenerators #113

Open: wants to merge 62 commits into base: master
Conversation

ancestor-mithril

Thank you for your work; this is a nice tool for augmenting 3D images.
My changes improve the performance of various methods, reducing the CPU time spent doing augmentations.
I've fully vectorized some augmentations and normalizations, and I use in-place NumPy operations where applicable. Please ask if you have a question about any change.
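A minimal sketch of the in-place idea (hypothetical code illustrating the kind of change described, not the actual batchgenerators implementation):

```python
import numpy as np

def zero_mean_unit_variance(data: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    # Hypothetical helper: `data -= mean` mutates the existing buffer,
    # whereas `data = data - mean` would allocate a whole new array of
    # the same size, costing extra CPU time and memory traffic.
    data -= data.mean()
    data /= data.std() + eps
    return data

batch = np.random.randn(2, 3, 16, 16).astype(np.float32)
out = zero_mean_unit_variance(batch)
assert out is batch  # normalized without copying the batch
```

Vectorizing over the whole batch and mutating in place avoids both Python-level loops and intermediate allocations.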

ancestor-mithril and others added 30 commits May 16, 2023 12:01
caching, optimizing conditionals, using tuples instead of lists, doing operations inplace
Using lru_cache for caching tuple creation
*unittest2 also has errors
* also adding minor improvements to utils functions (reformatting file, using lru_cache where possible)
…erands instead of transposing the higher dimensional ones
pandas unique is faster because it uses hashtable
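Two of the commit messages above can be sketched as follows; `as_tuple` is an invented helper for illustration, not the batchgenerators API:

```python
from functools import lru_cache

import numpy as np
import pandas as pd

# 1) lru_cache for tuple creation: when the same small tuple is rebuilt
#    for every sample, caching the constructor skips the reallocation.
@lru_cache(maxsize=None)
def as_tuple(value: float, length: int) -> tuple:
    # Hypothetical helper expanding a scalar into a per-axis tuple.
    return (value,) * length

a = as_tuple(0.5, 3)
b = as_tuple(0.5, 3)
assert a is b  # the cached tuple object is reused, not rebuilt

# 2) pd.unique builds a hash table (O(n)) instead of sorting like
#    np.unique (O(n log n)), so it is usually faster on large label maps.
labels = np.random.randint(0, 5, size=100_000)
assert set(pd.unique(labels)) == set(np.unique(labels))
```

Note that `pd.unique` returns values in order of first appearance rather than sorted, which is fine when only the set of labels matters.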
@FabianIsensee
Member

I am starting to review all your changes. There is a lot of stuff, thanks a lot! Might take me a while to do all that. That must have been so much work, wow!

@ancestor-mithril
Author

You're welcome!
I added a lot of changes, and a big part of them may not be relevant or useful, so you can be selective about what you want to include. I hope you find some parts that can be adapted to batchgenerators.
Also, I've been validating my implementation with the unittests and the nnUNet pipeline, but I'm not sure those cover all the cases.

@FabianIsensee
Member

I am not confident either about how much the unittests cover, which is why I would like to go through everything before approving. You have some pretty cool tricks up your sleeve in how you approach things. That's certainly a lot cleaner than the old batchgenerators implementation.
I will also run some integration tests with nnU-Net to see if there is a degradation (or improvement ;-) ) in segmentation performance.
Have you made any performance measurements of your PR vs the current batchgenerators master, for example with the nnU-Net data augmentation pipeline? That would be interesting.
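Such a measurement could be sketched with `timeit`; `augment_old` and `augment_new` below are hypothetical stand-ins for the master and PR implementations, which are not reproduced here:

```python
import timeit

import numpy as np

def augment_old(data):
    # Allocates a new array for each intermediate result.
    data = data - data.mean()
    return data / (data.std() + 1e-8)

def augment_new(data):
    # Mutates the buffer in place, avoiding intermediate allocations.
    data -= data.mean()
    data /= data.std() + 1e-8
    return data

batch = np.random.randn(4, 3, 64, 64).astype(np.float32)
t_old = timeit.timeit(lambda: augment_old(batch.copy()), number=200)
t_new = timeit.timeit(lambda: augment_new(batch.copy()), number=200)
print(f"master-style: {t_old:.4f}s, PR-style: {t_new:.4f}s")
```

For a meaningful comparison one would run the full nnU-Net augmentation pipeline on identical batches with both branches installed, rather than a toy transform like this.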
